Newest 'deep-learning reinforcement-learning' Questions

3 votes

2 answers

44 views

Required background for thorough understanding of Causal ML research papers?

I'm interested in pursuing research at the intersection of causal inference and machine learning, particularly on causal discovery and causal representation learning. Through my exploration so far, I ...

Harsh Shrivastava

31

asked Apr 5 at 15:08

2 votes

1 answer

84 views

How to deal with actions that complete in multiple steps (delayed reward) in reinforcement learning?

I have been exploring RL and using DQN to train an agent for a problem where I have two possible actions. But one of the actions is supposed to complete over multiple steps while the other one is ...

m101

23

asked Mar 18 at 16:29

1 vote

1 answer

51 views

Can I use minimax tree search over Q-values?

I'm trying to build a chess bot, and I'm trying to figure out if I can use Q-Values in a search tree by creating new nodes according to the number of possible moves, each with the corresponding Q-...

Leonhard Piff

21

asked Feb 18 at 23:31

2 votes

2 answers

425 views

What is the reinforcement learning reward function for reasoning in DeepSeek-R1?

DeepSeek-R1 is reported to have applied Group Relative Policy Optimization, in which it rewards "accuracy". How is this accuracy measured for theorem proving? A proof can be stated in myriad ...

Hans

123

asked Jan 25 at 9:54

0 votes

0 answers

27 views

Is my MAML implementation correct?

I'm trying to implement the MAML algorithm in the reinforcement learning domain but am not achieving fast adaptation to my validation tasks. I assume that something may be wrong with my meta loss ...

Mark Taylor

1

asked Dec 17, 2024 at 17:37

0 votes

0 answers

39 views

What’s the State of the Art in Traffic Light Control Using Reinforcement Learning? Ideas for Master’s Thesis?

I’m currently planning my Master’s thesis and I’m interested in the application of RL to traffic light control systems. I’ve come across research using different algorithms. However, I wanted to know: ...

Baki

19

asked Dec 17, 2024 at 14:09

1 vote

1 answer

53 views

What type of noise should I use with softmax activation?

I'm implementing an RL agent that navigates a graph. I'm using a softmax activation in the final layer of the actor network to model the action probabilities. To encourage exploration during training, ...

Baki

19

asked Dec 4, 2024 at 10:21

0 votes

1 answer

22 views

Unidentifiable flipped sign in policy gradient

Today I was building a VPG agent for a test and noticed it was getting worse, not better, over time, so I flipped the reward during the training loop and, lo and behold, it learned. So obviously I started ...

Leonhard Piff

21

asked Nov 27, 2024 at 14:52

1 vote

1 answer

77 views

How do I correctly apply action masking during DDPG training in Python?

I'm implementing the Deep Deterministic Policy Gradient (DDPG) algorithm in PyTorch, and I'm facing issues with applying an action mask during the training process. Currently, I apply an action mask ...

Oriol Feliu

11

asked Nov 13, 2024 at 9:34

0 votes

3 answers

56 views

Why does TD3/DDPG use −E[Q(s,π(s))] as the policy loss without causing Q-values to go to infinity?

I tried to understand why TD3/DDPG use a policy loss of −E[Q(s,π(s))], which should make the policy maximize Q-values. I expected this to push Q-values to infinity over time, as there’s no explicit ...

Omar

19

asked Nov 3, 2024 at 9:38

3 votes

1 answer

287 views

Can two different non-optimal policies have the same value functions?

According to Sutton and Barto, second edition, page 79, policy improvement must give a better policy except when the policy is already optimal. This means that if two policies have the same value function ...

User1983

33

asked Oct 5, 2024 at 17:35

1 vote

1 answer

135 views

Is deep learning suitable/preferable for string similarity detection and application automation? If so, which type?

Newbie here. I have developed an app that basically does the following: perform OCR, check whether words are contained in the resulting text, and then perform an action. If no words are detected from the given list, ...

zaxunobi

111

asked Jun 20, 2024 at 7:16

0 votes

1 answer

142 views

Is reinforcement learning suitable for application automation?

I have basically automated the use of an app with OCR and computer vision. When a word or an image is detected, it performs a certain action. When that action is ...

zaxunobi

111

asked May 30, 2024 at 20:45

1 vote

0 answers

38 views

Why are two completely different algorithms being used in Deep Q-Learning?

I'm a new student of reinforcement learning. Recently, I've been studying different RL algorithms. But I'm quite surprised that there are some algorithms which are named as "same" ...

Jahid Chowdhury Choton

41

asked May 8, 2024 at 15:19

1 vote

0 answers

25 views

Enhancing Generalization in DRL Agents in Static Data Environments

Context: I'm working with a deep reinforcement learning (DRL) agent in a market-like environment where its actions do not affect the environment. The environment uses historical data up to a certain ...

ElonMuskofBadIdeas

111

asked Jan 6, 2024 at 11:24
